Learning the semantics of structured data sources
Abstract
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources represent the implicit meaning of the data by specifying the concepts and the relationships within the data. Such models are the key ingredients to automatically publish the data into knowledge graphs. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most of the related work focuses on semantic annotation of the data fields (source attributes). However, constructing a semantic model that explicitly describes the relationships between the attributes in addition to their semantic types is critical. We present a novel approach that exploits the knowledge from a domain ontology and the semantic models of previously modeled sources to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and the known semantic models to construct a weighted graph that represents the space of plausible semantic models for the new source. Then, we compute the top k candidate semantic models and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic models on future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input. 
These precise models make it possible to automatically integrate the data across sources and provide rich support for source discovery and service composition. They also make it possible to automatically publish semantic data into knowledge graphs.
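The graph-and-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a tiny hand-made graph, invents the class and property names (`Person`, `worksFor`, and so on) and the edge weights, treats a lower weight as more plausible, and replaces any real top-k search with brute-force enumeration of the trees that connect the required classes.

```python
from itertools import combinations

# Hypothetical weighted graph: (source class, property, target class, weight).
# Lower weight = more plausible link (e.g. one already seen in known models).
EDGES = [
    ("Person", "worksFor", "Organization", 1.0),
    ("Organization", "location", "City", 1.5),
    ("Person", "bornIn", "City", 2.0),
    ("Person", "livesIn", "City", 3.0),
]

def is_minimal_tree(edges, terminals):
    """True if edges form a tree that contains every terminal and has no non-terminal leaf."""
    nodes = {n for u, _, v, _ in edges for n in (u, v)}
    if not terminals <= nodes or len(edges) != len(nodes) - 1:
        return False
    adjacency = {n: set() for n in nodes}
    for u, _, v, _ in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Depth-first search to check that all nodes are connected.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    if seen != nodes:
        return False
    return all(len(adjacency[n]) > 1 or n in terminals for n in nodes)

def top_k_models(edges, terminals, k):
    """Rank candidate models (as sorted property lists) by total edge weight."""
    candidates = []
    for size in range(1, len(edges) + 1):
        for subset in combinations(edges, size):
            if is_minimal_tree(subset, terminals):
                cost = sum(edge[3] for edge in subset)
                candidates.append((cost, sorted(edge[1] for edge in subset)))
    candidates.sort()
    return candidates[:k]

if __name__ == "__main__":
    # Classes that the new source's attributes were (hypothetically) mapped to.
    required = {"Person", "Organization", "City"}
    for cost, properties in top_k_models(EDGES, required, k=3):
        print(cost, properties)
```

Exhaustive enumeration is only workable for a handful of edges; it is used here so that the ranking is easy to follow, and a realistic graph would need a dedicated top-k tree search instead.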
Similar resources
Interrogation of a University Classrooms in the Court of Semantics: Managerial Implications
The purpose of this article, within the framework of an interpretive study, was to study the semantics of a university's classrooms to create a critical awareness of the meanings of the symptoms and their functions at the context of physical artifacts, besides their managerial implications. To accomplish this goal, after taking pictures of the structural elements of the studied classroo...
Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach
While the Internet has facilitated access to information sources, the task of scalable integration of these heterogeneous data sources remains a challenge. The adoption of the eXtensible Markup Language (XML) as the standard for data representation and exchange has led to an increasing number of XML data sources, both native and non-native. Recent integration work has mainly focused on developi...
Efficient Learning of Semi-structured Data from Queries
This paper studies the learning complexity of classes of structured patterns for HTML/ XML-trees in the query learning framework of Angluin. We present polynomial time learning algorithms for ordered gapped tree patterns, OGT, and ordered gapped forests, OGF, under the into-matching semantics using equivalence queries and subset queries. As a corollary, the learnability with equivalence and mem...
The role of Persian causative markers in the acquisition of English causative verbs
This project investigates the relationship between lexical semantics and causative morphology in the acquisition of causative/inchoative-related verbs in English as a foreign language by Iranian speakers. Results of translation and picture judgment task show although L2 learners have largely acquired the correct lexico-syntactic classification of verbs in English, they were constrained by ...
A Stylistic and Proficiency-based Approach to EFL Learners’ Performance Inconsistency
Performance deficiencies and inconsistencies among SLA or FL learners can be attributed to variety of sources including both systemic (i.e., language issues) and individual variables. Contrary to a rich background, the literature still suffers from a gap as far as delving into the issue from language proficiency and learning style is concerned. To fill the gap, this study addressed EFL learner...
Journal: J. Web Sem.
Volume: 37-38
Publication date: 2016